# Chinese Visual Question Answering
Qwen2.5 VL 7B Instruct GGUF
Apache-2.0
Qwen2.5-VL-7B-Instruct is a multimodal vision-language model that generates text from image and text inputs.
Image-to-Text English
samgreen
5,052
9
Aria Sequential Mlp Bnb Nf4
Apache-2.0
A bitsandbytes NF4-quantized version of Aria-sequential_mlp for image-to-text tasks. It requires approximately 15.5 GB of VRAM.
Image-to-Text
Transformers

leon-se
76
11
Vit Gpt2 Image Chinese Captioning
MIT
This model uses a ViT image encoder and a GPT-2 decoder to generate image captions in Chinese.
Image-to-Text
Transformers Chinese

yuanzhoulvpi
22
6
Featured Recommended AI Models